Clustering via Random Walk Hitting Time on Directed Graphs

Authors

  • Mo Chen
  • Jianzhuang Liu
  • Xiaoou Tang
Abstract

In this paper, we present a general data clustering algorithm based on the asymmetric pairwise measure of Markov random walk hitting time on directed graphs. Unlike traditional graph-based clustering methods, we do not explicitly calculate the pairwise similarities between points. Instead, we form a transition matrix of a Markov random walk on a directed graph directly from the data. Our algorithm constructs the probabilistic relations of dependence between local sample pairs by studying the local distributions of the data. Such dependence relations are asymmetric, which makes them a more general measure of pairwise relations than the similarity measures in traditional undirected graph-based methods, in that they consider both the local density and the geometry of the data. The probabilistic relations of the data naturally result in a transition matrix of a Markov random walk. Based on the random walk viewpoint, we compute the expected hitting time for all sample pairs, which explores the global information of the structure of the underlying directed graph. An asymmetric measure-based clustering algorithm, called K-destinations, is proposed for partitioning the nodes of the directed graph into disjoint sets. By utilizing the local distribution information of the data and the global structure information of the directed graph, our method is able to overcome some limitations of traditional pairwise similarity-based methods. Experimental results are provided to validate the effectiveness of the proposed approach.
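
The abstract's step of computing expected hitting times from a transition matrix can be illustrated with a small sketch. This is not the authors' K-destinations algorithm, and it does not reproduce how they build the transition matrix from data. It only shows one standard way to obtain hitting times by solving a linear system per target node. The function name `hitting_times` and the 3-node matrix `P` are hypothetical, and the sketch assumes every target node is reachable from every other node, so each linear system has a unique solution.

```python
import numpy as np

def hitting_times(P):
    """Return H where H[i, j] is the expected number of steps for a walk
    started at node i to first reach node j (H[j, j] is set to 0)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        # All nodes except the target j.
        keep = [i for i in range(n) if i != j]
        # Q is P restricted to non-target nodes; the hitting times to j
        # satisfy h = 1 + Q h, i.e. (I - Q) h = 1.
        Q = P[np.ix_(keep, keep)]
        h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        H[keep, j] = h
    return H

# Toy 3-node chain (hypothetical values; each row sums to 1).
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
])
H = hitting_times(P)
print(H)
```

In this toy chain the walk reaches node 1 from node 0 in one step, while the expected time from node 1 to node 0 is three steps, which illustrates the asymmetry of the hitting-time measure that the abstract emphasizes.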

Similar resources

How Slow, or Fast, Are Standard Random Walks? - Analyses of Hitting and Cover Times on Tree

Random walk is a powerful tool, not only for modeling, but also for practical uses such as Internet crawlers. Standard random walks on graphs have been well studied; it is well known that both the hitting time and the cover time of a standard random walk are bounded by O(n^3) for any graph with n vertices, and that the bound is tight for some graphs. Ikeda et al. (2003) provided “β-random walk,” which ...

Finding hitting times in various graphs

The hitting time, h_uv, of a random walk on a finite graph G is the expected time for the walk to reach vertex v given that it started at vertex u. We present two methods of calculating the hitting time between vertices of finite graphs, along with applications to specific classes of graphs, including grids, trees, and the 'tadpole' graphs. Keywords: random walks, hitting time.

Graph embedding using a quasi-quantum analogue of the hitting times of continuous time quantum walks

In this paper, we explore analytically and experimentally a quasi-quantum analogue of the hitting time of the continuous-time quantum walk on a graph. For the classical random walk, the hitting time has been shown to be robust to errors in edge weight structure and to lead to spectral clustering algorithms with improved performance. Our analysis shows that the quasi-quantum analogue of the hitt...

Quantum Random Walks Hit Exponentially Faster

We show that the hitting time of the discrete time quantum random walk on the n-bit hypercube from one corner to its opposite is polynomial in n. This gives the first exponential quantum-classical gap in the hitting time of discrete quantum random walks. We provide the framework for quantum hitting time and give two alternative definitions to set the ground for its study on general graphs. We t...

From random walks to distances on unweighted graphs

Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzin...

Publication date: 2008